More than a Dozen Alternative Ways of Spelling Gini
Author: S. Yitzhaki
Abstract
This paper surveys alternative ways of expressing Gini's mean difference and the Gini coefficient, and adds some new representations and interpretations of both. All in all, there are over a dozen alternative ways of writing the Gini, which can be useful in developing applications of Gini-based statistics.

Mailing Address: Department of Economics, Hebrew University, Jerusalem, 91905 Israel. E-Mail: [email protected]

Source: Yitzhaki, S.: More than a Dozen Alternative Ways of Spelling Gini, Research on Economic Inequality, 8, 1998, 13-30.

1 I would like to thank Peter Lambert for very helpful comments and a reference to Gini's original work in English.

Gini's mean difference (GMD) as a measure of variability has been known for over a century. It was "rediscovered" several times (see, for example, David, 1968; Jaeckel, 1972; Jurečková, 1969; Olkin and Yitzhaki, 1992; Simpson, 1948), which means that it was used by investigators who did not know that the statistic they were using was a version of the GMD. One possible explanation of this phenomenon is the large number of seemingly unrelated presentations of Gini's mean difference (and of other statistics derived from it), which makes it hard to identify which Gini one is dealing with. Being able to identify a Gini enables the investigator to derive additional properties of the statistic at hand and to rewrite it in an alternative, more user-friendly way. It also enables the investigator to find new interpretations of the Gini and of Gini-related statistics. One must be familiar with the alternative definitions whenever one is interested in extending the statistic at hand: as will become obvious later, some definitions are more amenable to such extensions.
Unfortunately, the alternative representations are scattered throughout many papers, spread over a long period and many areas of interest, and are not readily accessible.

2 For a description of the early development of the GMD see Dalton (1920); David (1981, p. 192); Gini (1921, 1936); and several entries in Harter (1978). Unfortunately, I am unable to survey the Italian literature, which includes, among others, Gini's (1912) original presentation of the index. A comprehensive survey of this literature can be found in Giorgi (1990, 1993).

3 This phenomenon seems to have been characteristic of the literature on the GMD from its early development. Gini (1921) argues: "Probably these papers have escaped Mr. Dalton's attention owing to the difficulty of access to the publications in which they appeared." (Gini, 1921, p. 124).

The aim of this paper is to survey alternative presentations of the GMD. As the survey is restricted to quantitative random variables, the literature on diversity, which is mainly concerned with categorical data, is not covered. For some purposes, the continuous formulation is more convenient, yielding insights that are not as accessible when the random variable is discrete. The continuous formulation is also preferred because it can be handled using calculus. To avoid problems of existence, only continuous distributions with a finite first moment will be considered. The presentation is also restricted to population parameters, ignoring different types of estimators. It is assumed that sample values substitute for population parameters in the estimation. As far as I know, these alternative representations cover all known cases, but I would not be surprised if others turn up. The different formulations explain why the GMD can be applied in so many different fields and given so many different interpretations.

The Gini coefficient is the GMD divided by twice the mean income.
It is the best-known member of the Gini family and is mainly used to measure income inequality. The relationship between the two is similar to that between the variance and the coefficient of variation. Hence, one need only derive the GMD and can then easily convert the representation into one for the Gini coefficient. Some additional properties relevant to the Gini coefficient will be added later.

4 For use of the GMD in categorical data, see the bibliography in Rao (1982) and Dennis et al. (1979) in biology; Lieberson (1969) in sociology; Bachi (1956) in linguistic homogeneity; and Gibbs and Martin (1962) for industry diversification.

5 One way of writing the Gini is based on vectors and matrices. This form is clearly restricted to discrete variables and hence it is not covered in this paper. For a description of the method see Silber (1989).

It is worth mentioning that reference to "variability" or "risk" (most common among statisticians and finance specialists) implies use of the Gini mean difference (GMD), whereas reference to "inequality" (usually in the context of income distribution) implies use of the Gini coefficient. The difference is not purely semantic or even one of plain arithmetic: it reveals a distinction in one's definition of an increase in variability (inequality). To see the difference, consider a distribution bounded by [a,b] and ask which distribution is the most variable (unequal). If the most variable distribution is defined as the one with half of the population at a and the other half at b, then the GMD (or the variance) is the appropriate index of variability. If the most unequal distribution is defined as the one with almost all the population concentrated at a and a tiny fraction at b (all income in the hands of one person), then the appropriate index is the Gini coefficient (or the coefficient of variation).
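The contrast between the two notions can be illustrated numerically. The following is a minimal Python sketch, not taken from the paper. It uses discrete two-point distributions as stand-ins for the bounded case, takes the GMD to be the expected absolute difference between two independent draws, and sets a = 0 and b = 100 (illustrative values chosen here so that "all income in the hands of one person" holds literally). All function names are hypothetical.

```python
def mean(values, probs):
    """Expected value of a discrete distribution."""
    return sum(x * p for x, p in zip(values, probs))


def gmd(values, probs):
    """Gini mean difference E|X1 - X2| for two independent draws
    from the same discrete distribution."""
    return sum(
        p_i * p_j * abs(x_i - x_j)
        for x_i, p_i in zip(values, probs)
        for x_j, p_j in zip(values, probs)
    )


def gini_coefficient(values, probs):
    """Gini coefficient: the GMD divided by twice the mean."""
    return gmd(values, probs) / (2.0 * mean(values, probs))


# Illustrative bounds (an assumption of this sketch, not from the paper).
a, b = 0.0, 100.0

# Half of the population at a, the other half at b.
half_half = ([a, b], [0.5, 0.5])
# Almost everyone at a, a tiny fraction at b.
concentrated = ([a, b], [0.99, 0.01])

for name, (vals, pr) in [("half/half", half_half), ("concentrated", concentrated)]:
    print(name, "GMD =", gmd(vals, pr), "Gini =", gini_coefficient(vals, pr))
```

Under these assumptions the half-and-half split has the larger GMD (50 versus about 1.98), while the concentrated split has the larger Gini coefficient (about 0.99 versus 0.5), which is the distinction the paragraph above draws.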
The structure of the paper is as follows. The next section derives the alternative presentations of the GMD; the third section adds some properties specific to the Gini coefficient; the fourth section investigates the similarity with the variance. The paper concludes with a section indicating areas for further research.

2. Alternative Presentations of GMD

There are four types of formulae for the GMD, depending on the elements involved. The first type is based on absolute values, the second relies on integrals of cumulative distributions, the third on covariances, and the fourth on Lorenz curves (or integrals of first moment distributions). Let $X_1$, $X_2$ be i.i.d. continuous random variables with $F(x)$ representing the cumulative distribution and $f(x)$ the density function. It is assumed that the expected value, $\mu$, exists; hence $\lim_{t \to -\infty} t F(t) = \lim_{t \to \infty} t[1 - F(t)] = 0$.

2.a: Formulations based on absolute values

The original definition of the GMD is the expected absolute difference between two realizations of i.i.d. variables. That is, the GMD in the population is:

$\Gamma = E\{|X_1 - X_2|\}$, (1)

which can be given the following interpretation: Consider an investigator who is interested in measuring the variability of a certain property in the population. He draws a random sample of two observations and records the absolute difference between them. Repeating the sampling and averaging the differences an infinite number of times yields the GMD. Hence, the GMD can be interpreted as the expected absolute difference between two randomly drawn members of the population. A variant of (1) is:

$\Gamma = E\{\, E\{|X_1 - q|\} \mid q = X_2 \,\}$. (2)

The term $E\{|X_1 - q|\}$ is the expected absolute deviation of $X_1$ from $q$, where $q$ is a quantile of $X$. The GMD is therefore the expected value of absolute deviations from quantiles of the random variable. In other words, the GMD is the average value of all possible absolute deviations of a variable from itself.

A slightly different set of presentations relies on the following identities: Let x, y be two
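Equations (1) and (2) can be checked against each other by simulation. The sketch below is an illustration only and makes several assumptions that are not in the paper: X is taken to be exponential with rate 1 (for which the GMD equals 1), the inner expectation in (2) uses the closed form E|X - q| = q - 1 + 2e^{-q} derived for that particular distribution, and the sample size, seed, and function names are arbitrary choices.

```python
import math
import random


def gmd_pairs(n, rng):
    """Equation (1): average |X1 - X2| over n independent pairs of Exp(1) draws."""
    total = 0.0
    for _ in range(n):
        x1 = rng.expovariate(1.0)
        x2 = rng.expovariate(1.0)
        total += abs(x1 - x2)
    return total / n


def mean_abs_dev_exp(q):
    """E|X - q| for X ~ Exp(1) and q >= 0 (closed form assumed for this sketch)."""
    return q - 1.0 + 2.0 * math.exp(-q)


def gmd_quantiles(n, rng):
    """Equation (2): average the absolute deviation from q, with q drawn as X2."""
    total = 0.0
    for _ in range(n):
        q = rng.expovariate(1.0)
        total += mean_abs_dev_exp(q)
    return total / n


rng = random.Random(0)  # fixed seed so the run is repeatable
est_1 = gmd_pairs(200_000, rng)
est_2 = gmd_quantiles(200_000, rng)
print("equation (1) estimate:", est_1)
print("equation (2) estimate:", est_2)
```

Both estimates should be close to 1, the GMD of the rate-1 exponential, which is consistent with (2) being a rewriting of (1) in which the outer expectation runs over the quantile q.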